Detecting horizontal gene transfer among microbiota: an innovative pipeline for identifying co-shared genes within the mobilome through advanced comparative analysis

ABSTRACT
The study presents an innovative pipeline for detecting horizontal gene transfer (HGT) among a collection of sequenced genomes from gut microbiota. Herein, chicken and porcine gut microbiota were analyzed. Based on statistical analysis, we propose that nearly identical genes co-shared between distinct genera can be evidence of a previous mobilization of that gene from genome to genome via HGT. Data mining, computational analysis, and network analysis were used to investigate the genomes of 452 isolates of chicken or porcine origin to detect genes involved in HGT. The proposed pipeline is user-friendly and includes network visualization. The study highlights that different species and strains of the same genus typically carry different cargoes of mobilized genes. The pipeline is capable of identifying not-yet-characterized genes, as well as genes that are usually co-transferred with genes involved in resistance, virulence, and/or mobilization. Among the analyzed genome collection, the main reservoirs of HGT genes were Phocaeicola spp. (Bacteroidaceae) and UBA9475 spp. (formerly Pseudoflavonifractor, Oscillospiraceae). Altogether, over 6,000 genes suspected of HGT were identified. Genes associated with intracellular trafficking and secretion and with DNA repair were enriched, while genes of unknown and general function were dominant but not enriched. Only 15 genes were co-shared between Gram-positive and Gram-negative bacteria, mostly genes directly associated with the mobilome or antibiotic resistance. Most HGT events, however, were identified among different genera of the same phylum. We therefore suggest that significant selection pressure acts on gene variants at the phylum level.
IMPORTANCE
Horizontal gene transfer (HGT) is a key driver in the evolution of bacterial genomes. The acquisition of genes through HGT may enable bacteria to adapt to ever-changing environmental conditions. Long-term application of antibiotics in intensive agriculture is associated with the dissemination of antibiotic resistance genes among bacteria, with consequences of public health concern. Commensal gut microbiota associated with farm animals are considered a reservoir of these resistance genes. Therefore, in this study, we identified known and not-yet-characterized mobilized genes originating from chicken and porcine fecal samples using our innovative pipeline, followed by network analysis that provides visualization to support interpretation.


Some comments:
- The writing/English needs improvement.
- As mentioned, this is an interesting and important topic, and the analyses appear to have been done carefully. However, the findings are presented at such a high level that it was difficult to assess their specific impacts. For example, a network analysis is performed to show the connections among genes in different genomes, but the specific important findings from this analysis are not clear. Perhaps some of the text on the general findings could be reduced and replaced with concrete, meaningful examples, which could include the genome contexts of genes moved among organisms by HGT. Do these also further support the homology-based conclusions for HGT events?
- The term "co-shared" is redundant: if the genes are shared, then they are present in both genomes.
- "sp." and "spp." should be plain text, not italics.
- Line 94: "suspicious suspected of for" needs editing.
- Line 100: the wording is a bit confusing. Are the gut microbiota densely inhabited, or the animals?
- Lines 109-110: is there a figure that could be referred to here?
- Line 131: statistical analysis.
- Line 151: I think "phylogenetically distinct" is meant? (also in the Figure 3 legend)
- Line 186: "main contributors shared genes" needs editing.
- Line 235: I assume amino acid is meant.
- Line 265+: the text seems very repetitive with line 186+.
- Lines 336-338, conclusion text: I struggle with this statement. If the genes were identified as having moved by HGT, then they would indeed be capable of being transferred? Is there a reason some genes can be transferred and others not?
- Similarly, is the next sentence a conclusion based on data in the manuscript? I don't recall seeing anything so specific in the results.
- Figure 5 legend: it would be helpful to include the statistical test performed to generate the given p-values.
- Figure 6: the lines between nodes are difficult to see, so I suggest they be made darker.
- The authors might want to discuss some of their findings in the context of previous work by Gogarten and colleagues that identifies biases in HGT events, specifically 10.1073/pnas.1001418107 and 10.1038/nrmicro2593.
Reviewer #12 (Comments for the Author): The authors present a pipeline for identifying HGT events across bacterial genera. Initially I was very excited about the paper, but a few questions came up for me when reading. First, the authors talk about the importance of using culture-independent methods to detect HGT, yet they use cultured isolates; can this method be used in a culture-independent experiment? Second, as written, the text makes the pipeline sound specific to this dataset; can it be applied more broadly? Third, is there a website that hosts the code for this pipeline? If so, what is the link? The manuscript in general needs to be proofread for clarity, and the methods need to be described briefly in the Results section (for example, which types of phylogenetic trees were constructed and which statistical tests were used). Finally, I think the authors should describe other methods for detecting HGT (e.g., Gubbins) and compare these established methods with their own.
- Line 40: What statistical analysis was used? Describe it here or omit this sentence.
- Line 54: Merge the two sentences on this line; they are related to one another, but as written they read as two separate ideas.
- Line 66: HGT is a form of recombination in some species. Be careful with the phrasing here.
- Lines 72-73: The text here is unclear. Please rewrite it.
- Lines 76-77: I do not think that you can limit studies on HGT to a single genus; many people are interested in detecting HGT for different applications across genera.
- Lines 78-80: These sentences are unclear and need to be revisited. For example, "MGEs and their ability to be further dispersed." seems to be out of place; please integrate it into the prior or following sentence.
- Lines 82-83: Describe what massive antibiotic usage means. All farming? 50% of farms? Is there a context you can put this in? Resistance in gut microbiota is also likely impacted by antibiotic consumption in the human population, not just by farming; please describe this here.
- Line 84: On line 84 the authors talk about the importance of using culture-independent methods to detect HGT; however, on line 91 they say that their method has only been used on cultivated isolates. Please resolve this discrepancy.
- Line 103: There is missing text here.
- Line 108: What phylogenetic analysis? Describe your methods. WGS? A single gene?
- Lines 130-132: Please describe the statistical analysis conducted.
- Line 146: What is the limitation due to increasing phylogenetic distance? Describe it.
- Line 148: Define what a genomospecies is.
- Line 169: I think the listed genomospecies would be better presented as a table rather than as a numbered list in the text.
- Line 186: How do you determine the directionality of transfer of HGT genes?
- Line 261: Merge this paragraph with the previous or following one. It is too short to stand on its own.
- Line 335: Is this pipeline specific to the isolates in your collection, or a generalized method that others could apply? This sentence makes it sound specific to this dataset only. Please clarify.
- Line 339: Is there a place where others can download this pipeline?
Reviewer #2 (Comments for the Author): In this manuscript, the authors describe a pipeline for detecting potential horizontal gene transfer (HGT) among bacterial genomes based on the presence of nearly identical genes co-shared between different taxonomic units. The pipeline involves the identification of coding regions in the genomes, construction of a phylogenetic network, identification of the nearly identical co-shared genes, characterization of their function, and visualization through network and heatmap analysis. The authors demonstrate the use of this pipeline for investigating the genomes of 452 isolates of chicken or porcine origin previously collected by the authors. Overall, I find the manuscript to be interesting and scientifically valid. However, I have the following comments and concerns:
1. The authors describe the pipeline as novel in the title, abstract and text. However, the use of co-shared genes as an indication of HGT is not new, as some examples cited in the manuscript show (e.g., references 17 and 18). While the authors may have used different parameters and specific tools, the pipeline itself is not novel. I would recommend avoiding the word "novel".
2. The authors used the criteria of "≥99% nucleotide identity over ≥99% global length alignment" and a minimum length of 300 bp to identify potential HGT. However, according to Groussin et al. 2021 (ref 17), 99% identity over 500 bp may correspond to HGT events from 10,000 years ago. While it is acceptable to use less strict criteria (as explained by the authors in the text), the time scale should be discussed, especially since the authors mention "long-term application of antibiotics in intensive agriculture" and "commensal farm-animal". It would be interesting to know whether there are any instances of HGT with 100% identity among the results and to discuss this subgroup separately. For example, do the resistance genes come from more recent HGT events?
3. The description of the bacterial draft genomes used in this study is not clear. In the Methods section, the authors state that they used 398 isolates from healthy chicken cecal mass (line 341) and 54 isolates from porcine feces (line 342). However, Table S1 (a list of all isolates used in this study) lists 392 chicken isolates and 60 porcine isolates. The authors refer to their previous studies (19, 33), but these references only describe chicken isolates. Additionally, in line 344 they mention the inclusion of "additional 173 genomes" without specifying what these genomes are and how they were collected. The source of the samples might be relevant, especially since the authors claim that the "animals were reared in commercial as well as backyard farms, were of different breed, age and sex, and were fed with food supplemented with probiotics" (lines 137-138), while the references relate to a single sampling. Moreover, since they use cultivated isolates, the choice of cultivation media may affect the outcome.
4. In lines 319-320, the authors claim that "only minority (31/6545) of genes suspected to HGT were involved in the resistance mechanism". However, previous studies (e.g., references 17 and 26), including the authors' own previous studies (e.g., ref 19), indicate a much higher prevalence of antibiotic resistance genes in the mobilome. Moreover, immediately afterwards, in the paragraph starting at line 321, the authors claim that their findings are consistent with previous studies and that the resistance genes were detected and enriched. This paragraph is unclear; I believe the findings regarding the resistance genes should be further discussed and clarified. If the resistance genes are indeed a minority, this is a very interesting finding. Could it be attributed to the less strict criteria used, which may lead to the detection of HGT events that predate the extensive use of antibiotics?
5. In lines 83-88, the authors describe the advantages of using culture-independent techniques, but then they use genome sequences of cultivated bacteria for their study. I think there should be an explanation of the advantages of studying cultured systems.
6. The authors emphasize the detection of genes with unknown functions as one of the highlights of their results. However, such genes are very common, and according to Figure 5, they are actually underrepresented in the co-shared genes compared with their abundance in the entire genomes of the isolates.
7. In Figure 1, panel III, it says "identification of nearly identical protein sequences", whereas nucleic acid sequences (not protein sequences) were used in the pipeline. Please correct this.
8. In Figure 6, the authors should add a sentence to the figure legend explaining what the nodes and edges represent.
9. Line 224 says "888 genes were co-shared across 8 phyla", whereas Table 1 shows 926 co-shared genes across 8 phyla (888 of which are known genes).
10. The paragraph starting at line 321 is not clear.
11. In general, I feel that the manuscript would benefit from professional editing, as some parts of it are not coherent and are hard to understand.
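The candidate filter described in Reviewer #2's item 2 (≥99% nucleotide identity over ≥99% of the alignment length, genes of at least 300 bp, shared between distinct genera) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the Hit record, its field names, and the idea that pairwise alignments are already parsed into such records are all hypothetical. The sketch also separates 100% identical hits from near-identical ones, the subgroup the reviewer suggests discussing separately.

```python
from dataclasses import dataclass

# Thresholds as quoted by Reviewer #2; kept as constants so they are easy to vary.
MIN_IDENTITY = 99.0   # percent nucleotide identity
MIN_COVERAGE = 99.0   # percent of the gene length covered by the alignment
MIN_LENGTH_BP = 300   # minimum gene length in base pairs


@dataclass(frozen=True)
class Hit:
    """One pairwise gene alignment (hypothetical record layout)."""
    query_gene: str
    subject_gene: str
    query_genus: str
    subject_genus: str
    identity: float   # percent identity
    coverage: float   # percent alignment coverage
    length_bp: int    # gene length in base pairs


def is_hgt_candidate(hit: Hit) -> bool:
    """True if the hit passes every threshold and links two different genera."""
    return (
        hit.identity >= MIN_IDENTITY
        and hit.coverage >= MIN_COVERAGE
        and hit.length_bp >= MIN_LENGTH_BP
        and hit.query_genus != hit.subject_genus
    )


def split_by_identity(hits):
    """Split candidates into identical (100%) and near-identical (99% to <100%) hits."""
    candidates = [h for h in hits if is_hgt_candidate(h)]
    identical = [h for h in candidates if h.identity == 100.0]
    near_identical = [h for h in candidates if h.identity < 100.0]
    return identical, near_identical
```

Keeping the thresholds as named constants makes it straightforward to rerun the filter at other cut-offs, which is also what Reviewer #4 asks about when questioning how identity thresholds affect the results.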
Reviewer #4 (Comments for the Author): In this manuscript, the authors describe the use of a bioinformatic analysis pipeline aimed at identifying horizontally transferred genes among a collection of genome sequences. The identification of HGT genes is based on: i) identification of shared genes with high identity and coverage among all genomes, among genomospecies, and within higher-order taxa; ii) statistical comparison of the COG profiles present in the shared gene pool of a given taxonomic level vs. the "All" gene pool; and iii) analysis of the highly similar genes shared by taxa whose COG profile differs statistically from the "All" COG profile. They apply this pipeline to 452 draft genomes obtained from bacterial isolates belonging to the pig and chicken intestinal microbiota.
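Step ii above, and Reviewer #4's later question about how a categorical COG profile can be compared with the "All" profile, can be illustrated with a one-sided hypergeometric test per COG category: is a category more frequent in the shared-gene pool than expected from its frequency in the whole gene pool? This is a swapped-in technique shown for comparison, not the authors' Friedman/Dunn procedure. The sketch assumes the shared pool is a subset of the "All" pool and that gene counts per COG category are already tabulated; the function names are hypothetical.

```python
from math import comb
from typing import Dict


def hypergeom_upper_tail(k: int, pool_size: int, successes_in_pool: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population=pool_size,
    successes=successes_in_pool, draws=draws), computed exactly with integers."""
    upper = min(successes_in_pool, draws)
    total = comb(pool_size, draws)
    favourable = sum(
        comb(successes_in_pool, i) * comb(pool_size - successes_in_pool, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def cog_enrichment(shared_counts: Dict[str, int], all_counts: Dict[str, int]) -> Dict[str, float]:
    """One-sided enrichment p-value for each COG category.

    shared_counts: genes per COG category in the shared (candidate HGT) pool.
    all_counts: genes per COG category in the whole ("All") gene pool.
    Assumes the shared pool is a subset of the "All" pool.
    """
    n_all = sum(all_counts.values())
    n_shared = sum(shared_counts.values())
    return {
        cog: hypergeom_upper_tail(shared_counts.get(cog, 0), n_all, all_counts[cog], n_shared)
        for cog in all_counts
    }
```

With pools in the millions of genes, the exact integer sums become slow; `scipy.stats.hypergeom.sf(k - 1, M, n, N)` computes the same upper tail, and the per-category p-values would still need a multiple-testing correction such as Benjamini-Hochberg.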
Major comments:
1. It is not clear whether the manuscript is intended to present the authors' findings about HGT in the pig and chicken microbiota or the utility of their pipeline. The title seems to indicate the second option. However, throughout the manuscript, it is not clear how the applied pipeline compares with other previously published pipelines/software in terms of advantages, limitations, and major applications. (Which identity thresholds should be applied? How do different identity thresholds affect the results? Are there HGT genes that are overlooked by the proposed pipeline? Which genes are those?) On the other hand, the finding that nearly identical genes shared by different bacterial genera likely participated in HGT is not a new observation. Please include a discussion of your findings in terms of the antimicrobial resistance genes found.
2. What are the findings if the genes shared at the family level are analyzed? Do the COG categories change or remain the same? Is there an enrichment of a particular COG category likely involved in HGT?
3. The Methods section needs to be rewritten to be more reader-friendly, providing a guide to follow the proposed pipeline. See an example here: https://www.frontiersin.org/articles/10.3389/fenvs.2022.901917/full.
4. Is it possible to provide the command lines used by the authors to apply this pipeline? This would be highly helpful for researchers trying to apply it.
5. In my opinion, contrary to what is stated in the Conclusions, this pipeline is not so intuitive and has many limitations. However, it seems very helpful for providing a preliminary set of shared genes that can be further characterized or analyzed to assess their involvement in HGT. I believe the conclusions should be centered on this.
6. While the title states that the pipeline identifies mobilome activity, the results only show the identification of highly identical shared genes annotated as encoding proteins involved in mobilization. In my opinion, this cannot be described as detection of "mobilome activity".
7. The introduction needs to be restructured to better address the importance of detecting HGT in the animal gut microbiota, the current detection pipelines/approaches, and the need for a pipeline that can do what the authors propose.
8. Lines 103-112: Is there missing text here? What were those 8 phyla? Additionally, the findings described in lines 108-112 are not shown in the phylogenetic trees provided, which are colored only by family.
9. Figures require adjustments for a more detailed presentation.
a. Fig. 2: The bacterial phyla need to be identified. Brackets or an additional color ring (since iTOL was used) could be added.
b. Fig. 3: Please modify the figure to better convey the information. How were the genomospecies grouped into the 7 phyla? How did this "sub-grouping improved and fastened the comparison"?
c. Fig. 4: Genes shared between the same or closely related species dominate the heatmap, and the genes shared by more distantly related species are difficult to see. Please correct this. You could use a color other than white in the "number of genes" scale so that the shared genes currently shown in very light blue are highlighted.
d. Fig. 5: Please clearly indicate which comparisons (which group vs. which group, or the subject of the comparison) resulted in p < 0.05. You could use brackets or lines. Figure 5 could also be used to explain how the comparison of COG profiles results in the identification of HGT genes, since this is not clear in the main text.
e. Fig. 6: Each color represents a family; however, many colors are similar, making visual identification difficult. Please write the names of important families, or of families involved in important findings, near their location in the corresponding networks.
10. In Table 1, it is not clear whether the identified genes suspicious of HGT are from the non-redundant pool (NRPG) or from the 1,235,343-gene pool. It would also be helpful to provide the number of NRPG genes identified. Also, are the "unknown genes" genes of unknown function? Please clarify or correct.
11. Lines 219-221: This explanation of 16S identity thresholds belongs in the Methods section.
12. Methods section, line 344: The provenance of the bacterial culture collection is unclear. Please detail (or cite and summarize) the procedure for isolating the bacterial strains. It is also unclear whether the additional 173 genomes are part of the 452 or not. Please clarify.
13. Line 279: "We consider co-shared genes between different genera are very likely to be mobilized." It is not clear how the previously presented reasoning supports this conclusion. Please clarify.
14. Methods section, lines 418-422: It is not clear what Dunn's test is comparing; the same applies to Fig. 5. Is it a given COG profile vs. the "All" category, or vs. any other category? Moreover, the text does not explain how comparing COG categories results in the identification of HGT genes, which is the main focus of the paper. Based on what is written in the Results/Discussion section (lines 239-244), it seems to be a comparison between the COG profile of a given group and the "All" group. However, this cannot be understood from Fig. 5 and is not described in the Methods section.
15. Related to the previous comment: how is the statistical analysis carried out? I am not sure how a categorical COG profile can be compared with another using the Friedman and Dunn tests.
Minor comments:
- Line 34: Is it necessary to write "co-shared"? "Shared" already conveys the idea that all the involved subjects (genomes) have something (a gene) in common.
- Line 46: "only several genes were co-shared between.." Please rephrase to specify how many genes.
- Lines 66-68: Is HGT the consequence or the cause of the adaptation/competition capacity of microorganisms? The cited reference does not deal with HGT. Please cite the correct reference supporting the claim.
- Line 77: How do the conventional methods for HGT detection rely on ... the analysis among multidrug-resistant pathogens? The meaning of this sentence is unclear.
- Lines 78-80: The phrase in these lines seems more related to the previous paragraph than to the paragraph in which it is currently placed.
- Line 84: Here the utility of culture-independent techniques is presented. However, it is stated below that the work was carried out with isolated bacteria. Please remove the phrase about culture-independent techniques.
- Line 94: "genes suspicious suspected of for HGT". Please correct.
- Line 117: Please reference the corresponding figure showing what is described in the text.
- Line 120: "To identify HGT genes" instead of "traits".
- Line 131: Please indicate the statistical analysis used.
- Line 145: Please correct the tense of the text. The past tense here gives the impression that you are describing your own findings, whereas you are actually presenting a previously published observation from the cited reference.
- Line 150: Please reference the corresponding figure showing what is described in the text.
- Line 265: The sentence "In total, 6,545 unique genes were co-shared by at least two genera" can be understood as two genera sharing 6,545 genes. Is this correct? Please clarify or correct.
- Line 377: How was the evolutionary relationship estimated before using dRep?
- Line 388: Why was a 300 bp threshold selected? Please indicate the rationale and include a short discussion (in the Discussion section) of how the threshold affects the results. For example, some small genes involved in HGT, such as recombination directionality factors, are ≈200 bp in length.
- Lines 406-408: Please specify how the comparison of COG profiles was used to identify HGT genes.
- Line 431: "No chickens have been sacrificed solely for the purpose of this study." Does the word "solely" mean that chickens were in fact euthanized, but not only for this research project? Please clarify. Also, assess whether "euthanized" is a more appropriate word than "sacrificed".
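Reviewer #4's questions about line 265 ("6,545 unique genes were co-shared by at least two genera") and about repeating the analysis at the family level both come down to how shared-gene clusters are counted. The Python sketch below assumes, hypothetically, that each cluster of nearly identical genes is stored together with the taxonomy of every genome carrying a member. It counts the clusters that span at least two taxa at a chosen rank, separately from the per-pair counts that the sentence at line 265 could be misread as reporting.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical layout: cluster ID -> one taxonomy dict per member gene,
# e.g. {"genus": "GenusA", "family": "FamilyX"}.
Clusters = Dict[str, List[Dict[str, str]]]


def clusters_spanning(clusters: Clusters, rank: str = "genus", min_taxa: int = 2) -> Set[str]:
    """IDs of clusters whose members come from at least `min_taxa` distinct taxa at `rank`."""
    return {
        cid
        for cid, members in clusters.items()
        if len({member[rank] for member in members}) >= min_taxa
    }


def pairwise_shared(clusters: Clusters, rank: str = "genus") -> Counter:
    """Number of clusters shared by each unordered pair of taxa at `rank`."""
    pairs: Counter = Counter()
    for members in clusters.values():
        taxa = sorted({member[rank] for member in members})
        for a, b in combinations(taxa, 2):
            pairs[(a, b)] += 1
    return pairs
```

The size of the set returned by `clusters_spanning` is a total of the kind reported at line 265, while `pairwise_shared` shows how those clusters are distributed over pairs of genera; passing `rank="family"` gives the family-level view requested in major comment 2.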
Reviewer #6 (Comments for the Author): The manuscript by Schwarzerova et al. introduces a potentially valuable bioinformatic pipeline for detecting lateral gene transfer (LGT) between bacteria. The authors claim that by searching for nearly identical gene sequences in phylogenetically unrelated genomes, they can identify recent cases of LGT and shed light on the emergence of antibiotic resistance. While the research topic is promising, the current version of the manuscript falls short of conveying the true value of the data and requires further attention from the authors.
To strengthen their work, the authors should compare their novel method with existing approaches and analyze a new dataset. It is crucial for the authors to demonstrate the advantages and limitations of their method in comparison with other techniques and to showcase what can be uncovered using their pipeline. They could start by applying several existing LGT detection methods to a well-analyzed set of bacterial genomes, followed by their own method. Each identified case should undergo verification through multiple independent statistical techniques, and the significance measures should be compared. Only after this thorough analysis should the authors introduce their dataset derived from chickens and pigs.
Regarding the pipeline itself, the verification of the detected cases lacks persuasiveness.While the authors employed COG groupings, additional tests are required to ensure the reliability of the results.Furthermore, the manuscript lacks information on how the authors assigned taxonomic names to the 400+ isolates.It is essential for the authors to clarify this aspect.Additionally, reference 33 presents a smaller set of isolates, which raises questions about the identification of bacteria used in the study.
A significant drawback of the manuscript is the combined presentation of Results and Discussions.This approach hinders the reader's ability to extract the true value of the work.It can be acceptable for some descriptive studies but not for a bioinformatic pipeline.The authors should clearly distinguish between these sections and provide a concise summary of their findings.As it stands, it is unclear whether the manuscript primarily focuses on the development of a novel pipeline or the analysis of the mobilomes of gut bacteria.Neither of these topics is adequately described, leading to an overall lack of clarity and limited incremental value of the manuscript.
Minor: the manuscript text needs revision regarding overstatements and the use of terms.
Reviewer #7 (Comments for the Author): The manuscript "Detecting Horizontal Gene Transfer: A Novel Methodology Pipeline for Identifying Co-Shared Genes and Mobilome Activity based on Homology Search" describes a methodology pipeline suitable for predicting shared genes with the potential for horizontal gene transfer. The analyzed dataset includes 452 genomes of bacterial isolates from the gut microbiota of animals (Gallus gallus and Sus scrofa). This manuscript addresses the current topic of HGT in the gut microbiota of animals. The study and prediction of possible gene transfer in animal breeding are topical due to the increasing numbers of resistant strains in animal breeding, with HGT being the major driver.
The authors should consider whether the manuscript is intended as a research article. If so, it should describe and discuss the actual results of the work in more depth and detail than the methodology pipeline, which uses previously published bioinformatics tools.
If the manuscript remains focused on describing the methodology pipeline itself, it would be appropriate to state in the manuscript:
1. The source code of the methodology pipeline, if available.
2. How the proposed methodology pipeline differs from previously published ones (https://www.science.org/doi/full/10.1126/sciadv.abj5056, https://doi.org/10.1186/s40168-019-0649-y, https://doi.org/10.3390/genes11070756).
3. The weaknesses of the analysis design.
I have the following other comments and recommendations that I would like the authors to address:
- Line 64: This sentence is not related to the topic.
- Lines 167-183: It is not necessary to list in the text all 18 genomospecies that do not share any genes, since this information is shown in Figure 4.
- Line 291: Move Table 2 to the supplementary material; this data presentation format is not suitable for the main text. Choose another way of presenting the horizontally transferred genes. Most of the identified genes are carried on mobile genetic elements, so it would be interesting to categorize the genes according to the MGEs that carry them.
- Lines 292-302: Please simplify the text. The information in the main text duplicates the legend of Figure 6. In the main text, give the names of the individual clusters of orthologous groups (COGs), not their numerical designations, which are incomprehensible to the reader. The COG IDs are listed in Table 2.
In general, the main text is difficult to understand, because it combines a description of the results with a description of the methodology pipeline. I propose moving the data analysis workflow to the Materials and Methods section and focusing instead on the presentation of the results in the Results section. The results should be discussed in the broader context of microbiome research. Overall, I propose a major revision of the text.
Reviewer #8 (Comments for the Author): In this work, the authors use sequencing data from over 450 isolates across 8 phyla of bacteria isolated from chicken and porcine fecal samples to examine shared gene content. The overall approach is commendable, and the pipeline has the potential to be a useful tool for the research community. However, the authors have provided only a limited application of their method and have not included biological interpretations of their findings, which undermines the work. In addition, significant English-language edits need to be made throughout the paper. Due to the extent of the revisions required, I have only included line edits below for major concerns.
There is no reason to introduce the term 'genomospecies' throughout the paper.Indicating what level of ANI (or other metric) you utilized to delineate a species and supporting that decision (as you have done through your references) is appropriate and avoids confusion.
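Reviewer #8's request to state the ANI level used to delineate species can be made concrete with a small sketch. The Python function below groups genomes by single linkage on pairwise ANI at a chosen cut-off. The 95% default is a commonly used species boundary, assumed here for illustration; the procedure is not necessarily the clustering that dRep performs, and the input layout (a dictionary keyed by unordered genome pairs) is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List


def ani_clusters(
    genomes: Iterable[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 95.0,
) -> List[List[str]]:
    """Group genomes into species-level clusters by single linkage on pairwise ANI.

    genomes: genome IDs.
    ani: maps frozenset({genome_a, genome_b}) to ANI in percent.
    Two genomes end up in the same cluster if a chain of pairs with
    ANI >= threshold connects them (union-find).
    """
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for pair, value in ani.items():
        if value >= threshold:
            a, b = tuple(pair)
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups: Dict[str, List[str]] = {}
    for g in parent:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())
```

Single linkage can chain genomes together: two genomes whose own ANI is below the cut-off still join one group if each is above it with a third genome. This is one reason to report both the metric and the linkage rule when defining species-level groups.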
Line 34: the study does not propose that 'nearly identical genes co-shared between distinct genera can be mobilized from genome to genome via HGT.'This sentence suggests that the first requirement is for genes to have high identity and this allows HGT to occur.Your study is proposing that you can provide evidence of recent HGT events between distinct genera by identifying these nearly identical genes.
Line 42: You speak of reservoirs of resistance, but you have not indicated what these are reservoirs for. Did you find evidence that these organisms are providing AMR genes to pathogens (which I presume are the beneficiaries of these reservoirs)?
Line 77: This is an example, not a method.
Lines 87-89 are irrelevant to your study.
Lines 140-142: What evidence do you have to support that a 99% identity threshold allows past events to be detected?
Lines 169-185: Why is there a numbered list inserted in your paper? Include this as a proper table, or discuss the major findings and move these details to the supplemental material.
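As a worked illustration of what the identity threshold permits, assuming an ungapped alignment over the full gene length (an assumption, since the exact identity definition is not given in these comments), at least 99% identity bounds the number of mismatches m in a gene of length L:

```latex
\begin{aligned}
\text{identity} &= \frac{L - m}{L} \ge 0.99
  \iff m \le 0.01\,L,
  \qquad m_{\max} = \lfloor 0.01\,L \rfloor \\
L = 300~\text{bp} &\Rightarrow m_{\max} = 3,
  \qquad L = 500~\text{bp} \Rightarrow m_{\max} = 5
\end{aligned}
```

Translating such a bound into a time since transfer, as in the Groussin et al. estimate cited by Reviewer #2, additionally requires an assumed substitution rate.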
Lines 199-202: I'm not sure your data can support this statement.Although it is possible that bacteria are moving rapidly between different areas, it is also possible that the genes detected are under strong selection pressure to preserve their sequence identity or that similar selection pressures in different environments are promoting their acquisition from local reservoirs.
Lines 204-210: Does this refer to reciprocal sharing between these species or to the fraction of putatively shared genes per genome?
Lines 296-305: These are obvious choices for network analysis, but was there a systematic method to choosing genes or was it arbitrary?
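One reproducible answer to this question, sketched in Python under assumed inputs (a cluster ID mapped to the (genome ID, genus) pairs of its members; all names hypothetical), is to build a genome-level network whose edge weight is the number of shared clusters and to rank clusters by how many genera and genomes they span. This is not the authors' procedure, only an example of a documented selection rule; it also makes explicit what nodes and edges represent, which Reviewer #2 asks to be stated in the Figure 6 legend.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical layout: cluster ID -> (genome_id, genus) for each member gene.
GenomeClusters = Dict[str, List[Tuple[str, str]]]


def genome_network(clusters: GenomeClusters, cross_genus_only: bool = True) -> Counter:
    """Edges between genomes (the nodes), weighted by the number of clusters they share.

    With cross_genus_only=True, pairs of genomes from the same genus are skipped,
    so every edge represents sharing between distinct genera.
    """
    edges: Counter = Counter()
    for members in clusters.values():
        genus_of = dict(members)  # genome_id -> genus
        for a, b in combinations(sorted(genus_of), 2):
            if cross_genus_only and genus_of[a] == genus_of[b]:
                continue
            edges[(a, b)] += 1
    return edges


def rank_clusters(clusters: GenomeClusters) -> List[str]:
    """Cluster IDs ordered by genera spanned, then genomes spanned (both descending)."""

    def sort_key(cid: str) -> Tuple[int, int, str]:
        members = clusters[cid]
        n_genera = len({genus for _, genus in members})
        n_genomes = len({genome for genome, _ in members})
        return (-n_genera, -n_genomes, cid)

    return sorted(clusters, key=sort_key)
```

The weighted edge list can be exported to a network viewer; ranking clusters by the number of genera they span is one transparent way to decide which genes to display, rather than choosing them ad hoc.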
Lines 317-325: Your analysis indicated that AMR genes (but not the mobile elements moving them) are highly identical in Gram-positive and Gram-negative bacteria, but this suggests there is a biological explanation related to AMR that is separate from recent HGT acquisition. If these genes were identical because they had recently been acquired, then you should also see evidence of their acquisition (transposons, IS elements, etc.) that is also highly identical, or you should provide literature supporting rapid loss of these MGEs.
Lines 340-341: "Importantly, the pipeline reveals new findings regarding not yet characterized genes, genes usually co-transferred with genes involved in resistance, virulence and/or mobilome activity." I don't doubt that your pipeline is capable of this, and I agree that it is a very useful tool for the research community. But you have not taken the time in this paper to show the application of this pipeline to these questions. Which hypothetical genes did you identify that should be investigated further? What evidence did you generate for the mechanism of AMR gene movement between diverse species? You identify some species with no shared genes and some that you consider reservoirs, but you do not discuss the biological implications of these findings or give any insight into possible reasons. You have identified genomes with co-shared genes but have not included any investigation of the nature of these co-shared genes or of how this information can be used in future studies.
Reviewer #9 (Comments for the Author): Review Schwarzerova et al. - Detecting HGT - novel methodology pipeline - June 2023. This paper proposes a new in silico methodology to detect HGT within complex populations, through NGS and selection of co-shared genes between "genomospecies". I believe this method is of great interest to the community. However, it is important to note that the threshold used here (>99% nucleotide identity and >99% alignment length), even if less stringent than in previous studies, is still restrictive. As such, it only includes genes detectable under this limitation, and it excludes any gene that could initially be absent from the recipient cell (for example, within MGEs), or genes located between two other genes that share sufficient sequence identity to support homologous recombination. Thus, I believe the authors should indicate, as early as the abstract, that their study is not at all exhaustive in detecting HGT. Another surprising point for me is the statement that this pipeline is "intuitive, easy to use" (l.339). I believe that if you are not trained in bioinformatics, it is not easy to use at all. To the best of my reading of the manuscript, the authors do not provide a website or an online platform where any microbiologist in the world could load a set of genomes and look for HGT between them using this pipeline. That would really be easy to use. I do not mean this as a request as a referee; however, please do not assume that all microbiologists are computer scientists. Be careful: L103-107 are empty lines!!!
L120-132: in my opinion, this crucial part of the paper is not so clear for microbiologists who are not computer scientists.For example, I did not understand what is dREP?I though it would be explained later, but it is not explained either in l.147.The paper should be easily understandable for any microbiologist.In the same line, in Figure 1 legend, please provide the meaning of each acronym.Some are understandable (RAxML, ClustalOmega), but some are not so used to my knowledge (UBCG, iTOL, dRep, eggnog-mapper...).Perhaps each person would be familiar with different acronyms, but not all of them.So please make the figure comprehensible by the legend.L137 the reader learn important things about the collection used (collection over 5 years...).I believe the collection should be presented in more details in the section "Bacterial diversity of animal gut ..." L199-202: the observation of HGT between strains from chicken gut and porcine gut is interesting.However, these HGT cannot be direct from chicken gut bacteria to porcine gut bacteria.Why the authors did not discuss the probability that one or several intermediate HGT would have occurred?Other minor comments: -L54: I think that the misuse of antibiotics in human medicine is also responsible for the dissemination of AMR, leading to well known nosocomial infections.-L55-56: Gut microbiota is not the only important reservoir of AMR genes.
-L94: please rephrase -L98: I think the authors should add "analysed in this work" at the end of the title of this section.The authors did not analysed the animal gut microbiota of all animals, nor in all countries... -L111 why "also"?Does another phylogenetic analysis is involved here?(unless 16S rDNA) -L129 please explain CD-HIT -L130 please explain eggNOG-mapper -L169-185 why the authors did not provide this list as a Table ?-L283: to assess -L324 I guess the authors means antibiotic resistance, or any kind of resistance including to heavy metals, endonucleases and other bacteriocins?-L335 please replace "the collection" by "our collection" as this is a specific study based on a specific collection.-L337 comprise genes which are capable to be... -Reference number 10 should be replaced by another one which have been peer reviewed.I do not think that such reference from Internet have been peer reviewed.
-Figure 3 legend please identify 1 to 7 in Legend -Figure S1 the phylogeny could be enlarged -Figure 2 and S1, it could be nice to indicate the different phyla on the left of the colored ranges.

Preparing Revision Guidelines
To submit your modified manuscript, log onto the eJP submission site at https://spectrum.msubmit.net/cgi-bin/main.plex. Go to Author Tasks and click the appropriate manuscript title to begin the revision process. The information that you entered when you first submitted the paper will be displayed. Please update the information as necessary. Here are a few examples of required updates that authors must address:
• Point-by-point responses to the issues raised by the reviewers in a file named "Response to Reviewers," NOT IN YOUR COVER LETTER.
• Upload a compare copy of the manuscript (without figures) as a "Marked-Up Manuscript" file.
• Each figure must be uploaded as a separate file, and any multipanel figures must be assembled into one file.
For complete guidelines on revision requirements, please see the journal Submission and Review Process requirements at https://journals.asm.org/journal/Spectrum/submission-review-process. Submission of a paper that does not conform to Microbiology Spectrum guidelines will delay acceptance of your manuscript. Please return the manuscript within 60 days; if you cannot complete the modification within this time period, please contact me. If you do not wish to modify the manuscript and prefer to submit it to another journal, please notify me of your decision immediately so that the manuscript may be formally withdrawn from consideration by Microbiology Spectrum.
If your manuscript is accepted for publication, you will be contacted separately about payment when the proofs are issued; please follow the instructions in that e-mail. Arrangements for payment must be made before your article is published. For a complete list of Publication Fees, including supplemental material costs, please visit our website.
Corresponding authors may join or renew ASM membership to obtain discounts on publication fees. Need to upgrade your membership level? Please contact Customer Service at Service@asmusa.org.
Thank you for submitting your paper to Microbiology Spectrum.

Pipeline for Identifying Co-Shared Genes and Mobilome Activity based on Homology Search".
In this manuscript, the authors describe a pipeline for detecting potential horizontal gene transfer (HGT) among bacterial genomes based on the presence of nearly identical genes co-shared between different taxonomic units. The pipeline involves the identification of coding regions in the genomes, construction of a phylogenetic network, identification of the nearly identical co-shared genes, characterization of their function, and visualization through network and heatmap analysis. The authors demonstrate the use of this pipeline for investigating the genomes of 452 isolates of chicken or porcine origin previously collected by the authors.
Overall, I find the manuscript to be interesting and scientifically valid. However, I have the following comments and concerns:
1. The authors describe the pipeline as novel in the title, abstract and text. However, the use of co-shared genes as an indication of HGT is not new, as some examples cited in the manuscript show (e.g., references 17 and 18). While the authors may have used different parameters and specific tools, the pipeline itself is not novel. I would recommend avoiding the word "novel".
2. The authors used the criteria of "≥ 99% nucleotide identity over ≥ 99% global length alignment" and a minimum length of 300 bp to identify potential HGT. However, according to Groussin et al. 2021 (ref 17), 99% identity over 500 bp may correspond to HGT events that occurred 10,000 years ago. While it is acceptable to use less strict criteria (as explained by the authors in the text), the time scale should be discussed, especially since the authors mention "long-term application of antibiotics in intensive agriculture" and "commensal farm-animal". It would be interesting to know whether there are any instances of HGT with 100% identity among the results and to discuss this subgroup of HGT separately. For example, do the resistance genes come from more recent HGT events?
3. The description of the bacterial draft genomes used in this study is not clear. In the Methods section, the authors state that they used 398 isolates from healthy chicken cecal mass (line 341) and 54 isolates from porcine feces (line 342). However, Table S1 (a list of all isolates used in this study) lists 392 chicken isolates and 60 porcine isolates. The authors refer to their previous studies (19, 33), but these references only describe chicken isolates. Additionally, in line 344 they mention the inclusion of "additional 173 genomes" without specifying what these genomes are and how they were collected. The source of the samples might be relevant, especially since the authors claim that the "animals were reared in commercial as well as backyard farms, were of different breed, age and sex, and were fed with food supplemented with probiotics" (lines 137-138), while the references relate to a single sampling. Moreover, since they use cultivated isolates, the choice of cultivation media may affect the outcome.
4. In lines 319-320, the authors claim that "only minority (31/6545) of genes suspected to HGT were involved in the resistance mechanism". However, previous studies (e.g., references 17 and 26), including the authors' own previous studies (e.g., ref 19), indicate a much higher prevalence of antibiotic resistance genes in the mobilome. Moreover, immediately afterwards, in the paragraph starting in line 321, the authors claim that their findings are consistent with previous studies and that the resistance genes were detected and enriched. This paragraph is unclear; I believe the findings regarding the resistance genes should be further discussed and clarified. If the resistance genes are indeed a minority, this is a very interesting finding. Could it be attributed to the less strict criteria used, which may lead to the detection of HGT events that predate the extensive use of antibiotics?
5. In lines 83-88, the authors describe the advantages of using culture-independent techniques, but then they use genome sequences of cultivated bacteria for their study. I think there should be an explanation of the advantages of studying cultured systems.
6. The authors emphasize the finding of genes with unknown functions as one of the highlights of their findings. However, such genes are very common, and according to Figure 5, they are actually underrepresented in the co-shared genes compared to their abundance in the entire genomes of the isolates.
7. In Figure 1, panel III, it says "identification of nearly identical protein sequences", whereas nucleic acid sequences (and not protein sequences) were used in the pipeline. Please correct.
8. In Figure 6, the authors should add a sentence to the figure legend explaining what the nodes and edges represent.
9. In line 224 it says "888 genes were co-shared across 8 phyla", whereas Table 1 shows that there are 926 co-shared genes across 8 phyla (888 of which are known genes).
10. The paragraph starting in line 321 is not clear.
11. In general, I feel that the manuscript would benefit from professional editing, as some parts of it are not coherent and are hard to understand.
In this manuscript, the authors describe the use of a bioinformatic analysis pipeline aiming to identify horizontally transferred genes among a collection of genome sequences. The identification of HGT genes is based on: i) identification of shared genes with high identity and coverage among all genomes, genomospecies, and within higher-order taxa; ii) statistical comparison of the COG profiles present in the shared gene pool at a given taxonomic level vs. the "All" gene pool; and iii) analysis of the highly similar genes shared by taxa showing a COG profile statistically different from the "All" COG profile. They apply this pipeline to 452 draft genomes obtained from bacterial isolates belonging to the pig and chicken intestinal microbiota.
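Step i) can be illustrated with a minimal Python sketch of the pairwise filter. It is not the authors' code and is offered only as an assumption-laden illustration: it assumes an all-vs-all nucleotide comparison has already produced one record per gene pair (percent identity, alignment length, and both gene lengths), that "global length alignment" means the alignment length divided by the length of the longer gene, and that a gene-to-genus lookup is available. The thresholds are those quoted from the manuscript in these reviews (≥99% identity, ≥99% alignment length, ≥300 bp); the names `Hit` and `co_shared_between_genera` and all gene IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise nucleotide alignment between two genes (hypothetical record)."""
    query: str
    subject: str
    identity: float  # percent nucleotide identity over the alignment
    aln_len: int     # alignment length in bp
    qlen: int        # query gene length in bp
    slen: int        # subject gene length in bp


def co_shared_between_genera(hits, genus_of, min_id=99.0, min_cov=99.0, min_len=300):
    """Return unordered gene pairs that pass the identity, coverage and length
    thresholds and whose two genes belong to different genera."""
    pairs = set()
    for h in hits:
        if h.query == h.subject:
            continue  # ignore self-hits
        if min(h.qlen, h.slen) < min_len:
            continue  # at least one gene is shorter than the length cut-off
        # Assumed definition of "global" coverage: alignment over the longer gene
        coverage = 100.0 * h.aln_len / max(h.qlen, h.slen)
        if h.identity < min_id or coverage < min_cov:
            continue
        if genus_of[h.query] != genus_of[h.subject]:
            pairs.add(frozenset((h.query, h.subject)))
    return pairs


# Toy input with hypothetical gene IDs and genera
hits = [
    Hit("a1", "b1", 99.6, 1200, 1200, 1201),  # passes, different genera
    Hit("a1", "a2", 100.0, 900, 900, 900),    # passes thresholds, but same genus
    Hit("a3", "b2", 98.5, 800, 800, 800),     # identity below 99%
    Hit("a4", "b3", 100.0, 250, 250, 250),    # shorter than 300 bp
    Hit("a5", "b4", 99.9, 600, 600, 700),     # coverage about 85.7%
]
genus_of = {g: "GenusA" for g in ("a1", "a2", "a3", "a4", "a5")}
genus_of.update({g: "GenusB" for g in ("b1", "b2", "b3", "b4")})

print(co_shared_between_genera(hits, genus_of))
```

Pairs rather than single genes are returned so that each candidate can be traced to the two genera involved. Such a filter only flags candidates; it says nothing about the direction or timing of a transfer, and lowering `min_len` (for example to capture ≈200 bp genes) changes which pairs are kept.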
Major comments:
1. It is not clear whether the manuscript is oriented toward presenting the authors' findings about HGT in the pig and chicken microbiota or the utility of their pipeline. The title seems to indicate the second option. However, throughout the manuscript, it is not clear how the applied pipeline compares to other previously published pipelines/software in terms of advantages, limitations, and major applications (Which identity thresholds should be applied? How do different identity thresholds affect the results? Are there HGT genes that are overlooked by the proposed pipeline? Which genes are those?). On the other hand, the finding that nearly identical genes shared by different bacterial genera likely participated in HGT is not a new observation. Please include a discussion of your findings in terms of the antimicrobial-resistance genes found.
2. What are the findings if the genes shared at the Family level are analyzed? Do the COG categories change or remain the same? Is there an enrichment of a particular COG category likely involved in HGT?
3. The Methods section needs to be rewritten to be more reader-friendly, in order to provide a guide to following the proposed pipeline. See an example here: https://www.frontiersin.org/articles/10.3389/fenvs.2022.901917/full.
4. Is it possible to provide the command lines used by the authors to apply this pipeline? This would be highly helpful for researchers trying to apply this pipeline.
5. In my opinion, contrary to what is stated in the Conclusions, this pipeline is not that intuitive and has many limitations. However, it seems to be very helpful for providing a preliminary set of shared genes that can be further characterized or analyzed in order to assess their involvement in HGT. I believe the conclusions should be centered on this.
6. While the title states that the pipeline identifies mobilome activity, the results only show the identification of highly identical shared genes annotated as encoding proteins involved in mobilization. In my opinion, this cannot be described as detection of "mobilome activity".
7. The introduction needs to be restructured to better address the importance of detecting HGT in the animal gut microbiota, the current detection pipelines/approaches, and the need for a pipeline that can do what the authors propose.
8. Lines 103-112. Is there missing text here? What were those 8 phyla? Additionally, the findings described in lines 108-112 are not shown in the phylogenetic trees provided, which are colored only by Family.
9. Figures require adjustments for a more detailed presentation.
a. Fig. 2. Requires identification of the bacterial phyla. Brackets or an additional color ring (since iTOL was used) could be added.
b. Fig. 3. Please modify the figure to better convey the information. How were the genomospecies grouped into the 7 phyla? What is meant by "sub-grouping improved and fastened the comparison"?
c. Fig. 4. Shared genes between the same or closely related species dominate the heatmap. The genes shared by more distantly related species are difficult to see. Please correct this. You could use a color other than white at the low end of the "number of genes" scale, which would highlight the shared genes currently shown in very light blue.
d. Fig. 5. Please clearly indicate which comparisons (which group vs. which group, or the subject of the comparison) resulted in p<0.05. You could use brackets or lines. Figure 5 could be used to explain how the comparison of COG profiles results in the identification of HGT genes, since this is not clear in the main text.
e. Fig. 6. Each color represents a Family; however, many colors are similar, making visual identification difficult. Please write the names of important Families, or of Families involved in important findings, near their location in the corresponding networks.
10. In Table 1, it is not clear whether the "Identified genes suspicious of HGT" are from the non-redundant pool (NRPG) or from the 1,235,343 pool. Also, it would be helpful to provide the numbers of the NRPG genes identified. Also, are the "unknown genes" genes of unknown function? Please clarify or correct.
11. Lines 219-221. This explanation of 16S identity thresholds belongs in the Methods section.
12. Methods section, line 344. The provenance of the bacterial culture collection is unclear. Please detail (or cite and summarize) the procedure for isolating the bacterial strains. Also, it is unclear whether the additional 173 genomes are part of the 452 or not. Please clarify.
13. Line 279. "We consider co-shared genes between different genera are very likely to be mobilized." It is not clear how the previously presented reasoning supports this conclusion. Please clarify.
14. Methods section, lines 418-422. It is not clear what Dunn's test is comparing. The same applies to Fig. 5: is it a given COG profile vs. the "All" category, or vs. any other category? Moreover, the text does not explain how the comparison of COG categories results in the identification of HGT genes, which is the main focus of the paper. Based on what is written in the Results/Discussion section in lines 239-244, it seems to be a comparison between the COG profile of a given group and the "All" group. However, this cannot be understood from Fig. 5 and is not described in the Methods section.
15. Related to the previous comment: how is the statistical analysis carried out? I'm not sure how one categorical COG profile can be compared to another with the Friedman and Dunn tests.
Minor comments: Line 34: Is it necessary to write "co-shared"?"Shared" already conveys the idea of all the involved subjects (genomes) having something (gene) in common.
Line 46: "only several genes were co-shared between.." Please rephrase to specify how many genes.
Lines 66-68: Is HGT the consequence or the cause of the adaptation/competition capacity of microorganisms? The cited reference does not deal with HGT. Please cite the correct reference supporting the claim.
Line 77: How do the conventional methods for HGT detection rely on … the analysis among multidrug-resistant pathogens? The meaning of this sentence is unclear.
Lines 78-80: the sentence in these lines seems to be more related to the previous paragraph than to the paragraph in which it is currently placed.
Line 84: Here the utility of culture-independent techniques is presented. However, below it is stated that the work was carried out with isolated bacteria. Please remove the sentence about culture-independent techniques.
Line 117: Please reference the corresponding figure showing what was described in the text.
Line 131: Please indicate the statistical analysis used.
Line 145: Please correct the tense of the text, since the past tense here gives the impression that you are describing your own findings, whereas for the cited reference you are actually presenting a previously published observation.
Line 265: The sentence "In total, 6,545 unique genes were co-shared by at least two genera" can be understood as meaning that two genera share 6,545 genes. Is this correct? Please clarify or correct.
Line 377: How was the evolutionary relationship estimated before using dRep?
Line 388: Why was a 300 bp threshold selected? Please indicate the rationale and include a short discussion (in the Discussion section) of how the threshold affects the results. For example, some small genes involved in HGT, such as recombination directionality factors, are ≈200 bp in length.
Lines 406-408. Please specify how the comparison of COG profiles was used to identify HGT genes.
Line 431. "No chickens have been sacrificed solely for the purpose of this study." Does the word "solely" mean that chickens were in fact euthanized, but not only for this research project? Please clarify. Also, consider whether "euthanized" is a more appropriate word than "sacrificed".

The authors present a pipeline for identifying HGT events across bacterial genera. Initially I was very excited about the paper, but a few questions came up for me when reading. First, the authors talk about the importance of using culture-independent methods to detect HGT, but they use cultured isolates; can this method be used in a culture-independent experiment? Second, as written, the text makes the pipeline sound specific to this dataset; can it be applied more broadly? Third, is there a website that hosts the code for this pipeline? If so, what is the link? The manuscript in general needs to be proofread for clarity, and the methods need to be described briefly in the Results section (for example, which types of phylogenetic trees were constructed and which statistical tests were used). Finally, I think the authors should describe other methods for detecting HGT (e.g., Gubbins) and compare these established methods to their own.
Line 40: What statistical analysis was used? Describe it here or omit this sentence.
Line 54: Merge the two sentences on this line; they are related to one another and, as written now, read as two separate ideas.
Line 66: HGT is a form of recombination in some species. Be careful with the phrasing here.
Lines 72-73: The text here is unclear. Please rewrite.
Lines 76-77: I do not think that you can limit studies on HGT to a single genus; many people are interested in detecting HGT across genera for different applications.
Lines 78-80: These sentences are unclear and need to be revisited. For example, "MGEs and their ability to be further dispersed." seems to be out of place; please integrate it into the prior or following sentence.
Lines 82-83: Describe what massive antibiotic usage means. All farming? 50% of farms? Is there a context you can put this in? Also, resistance in gut microbiota is likely also affected by antibiotic consumption in the human population, not just by farming; please describe this here.
Line 84: On line 84 the authors talk about the importance of using culture-independent methods to detect HGT; however, on line 91 they say that their method has only been used on cultivated isolates. Please resolve this discrepancy.
Line 103: There is missing text here.
Line 186: How do you determine the directionality of transfer of HGT genes?
Line 261: Merge this paragraph with the previous or following one. It is too short to stand on its own.
Line 335: Is this pipeline specific to the isolates in your collection, or a generalized method that others could apply? This sentence makes it sound specific to this dataset only. Please clarify.
Line 339: Is there a place to download this pipeline for others?
Lines 102-109: The eight phyla mentioned in the text are missing. The taxonomy given throughout the text does not follow the currently valid taxonomy of prokaryotes; please modify it according to https://doi.org/10.1099/ijsem.0.005056.
Line 107: Please clarify what the 81 conserved genes are and on the basis of what criteria they were selected.
Line 111: Fig. S1. It is not clear what the main message of this figure is. It would be useful to highlight the families belonging to each phylum mentioned in the text.
Line 152: Figure 3. I suggest moving Figure 3 to the supplementary material.
• Manuscript: A .DOC version of the revised manuscript
• Figures: Editable, high-resolution, individual figure files are required at revision, TIFF or EPS files are preferred

Line 150: Please reference the corresponding figure showing what is described in the text.
Line 108: What phylogenetic analysis? Describe your methods. WGS? A single gene?
Lines 130-132: Please describe the statistical analysis conducted.
Line 146: What is the limitation due to increasing phylogenetic distance? Please describe it.
Line 148: Define what a genomospecies is.
Line 169: I think the listed genomospecies would be better represented as a table rather than as a numbered list in the text.